APPLICATION FOR LETTERS PATENT 



Improved Image Retrieval Based On Relevance Feedback 



Inventor(s): 
Yong Rui 



ATTORNEY'S DOCKET NO. MS1-610US 



RELATED APPLICATIONS 

This application claims the benefit of U.S. Provisional Application No. 
60/153,730, filed September 13, 1999, entitled "MPEG-7 Enhanced Multimedia 
Access" to Yong Rui, Jonathan Grudin, Anoop Gupta, and Liwei He, which is 
hereby incorporated by reference. 

TECHNICAL FIELD 

This invention relates to image storage and retrieval, and more particularly 
to retrieving images based on relevance feedback. 

BACKGROUND OF THE INVENTION 

Computer technology has advanced greatly in recent years, and the uses for 
computers have grown accordingly. One such use is the storage of images. 
Databases of images that are accessible to computers are constantly expanding and 
cover a wide range of areas, including stock images that are made commercially 
available, images of art collections (e.g., by museums), etc. However, as the 
number of stored images has increased, so too has the difficulty of managing 
their retrieval. It is often difficult for a user to search such image databases to 
find particular images among the thousands that are available. 

One difficulty in searching image databases is the manner in which images 
are stored versus the manner in which people think about and view images. It is 
possible to extract various low-level features regarding images, such as the color 
of particular portions of an image and shapes identified within an image, and make 
those features available to an image search engine. However, people don't tend to 
think of images using such low-level features. For example, a user that desires to 
retrieve images of brown dogs would typically not be willing and/or able to input 
search parameters identifying the necessary color codes and particular areas 
including those color codes, plus whatever low-level shape features are necessary 
to describe the shape of a dog in order to retrieve those images. Thus, there is 
currently a significant gap between the capabilities provided by image search 
engines and the usability desired by people using such engines. 

One solution is to provide a text-based description of images. In 
accordance with this solution, images are individually and manually categorized 
by people, and various descriptive words for each image are added to a database. 
For example, a picture of a brown dog licking a small boy's face may include key 
words such as dog, brown, child, laugh, humor, etc. There are, however, problems 
with this solution. One such problem is that it requires manual categorization: an 
individual(s) must take the time to look at a picture, decide which key words to 
include for the picture, and record those key words. Another problem is that such 
a process is subjective. People tend to view images in different ways, viewing 
shapes, colors, and other features differently. With such a manual process, the key 
words will be skewed towards the way the individual cataloging the images views 
the images, and thus different from the way many other people will view the 
images. 

The invention described below addresses these disadvantages, providing for 
improved image retrieval based on relevance feedback. 

SUMMARY OF THE INVENTION 

Improved image retrieval based on relevance feedback is described herein. 
According to one aspect, a hierarchical (per-feature) approach is used in 
comparing images. Multiple query vectors are generated for an initial image by 
extracting multiple low-level features from the initial image. When determining 
how closely a particular image in an image collection matches that initial image, a 
distance is calculated between the query vectors and corresponding low-level 
feature vectors extracted from the particular image. Once these individual 
distances are calculated, they are combined to generate an overall distance that 
represents how closely the two images match. 

According to another aspect, when a set of potentially relevant images is 
presented to a user, the user is given the opportunity to provide feedback regarding 
the relevancy of the individual images in the set. This relevancy feedback is then 
used to generate a new set of potentially relevant images for presentation to the 
user. The relevancy feedback is used to influence the generation of the query 
vector, the weights assigned to individual distances between query vectors and 
feature vectors when generating an overall distance, and the determination of the 
distances between the query vectors and the feature vectors. 

According to another aspect, the calculation of a distance between a query 
vector and a feature vector involves the use of a matrix to weight the individual 
vector elements. The type of matrix used varies dynamically based on the number 
of images for which feedback has been received from the user and the number of 
feature elements in the feature vector. If the number of images for which feedback 
has been received is less than the number of feature elements, then a diagonal 
matrix is used (which assigns weights to the individual vector elements in the 
distance calculation). However, if the number of images for which feedback has 
been received equals or exceeds the number of feature elements, then a full matrix 
is used (which transforms the low-level features of the query vector and the 
feature vector to a higher level feature space, as well as assigns weights to the 
individual transformed elements in the distance calculation). 
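
The matrix-selection rule described above can be sketched in a few lines. This is a minimal illustration only: the function name `choose_matrix_type` and its string return values are hypothetical and are not part of the described system.

```python
def choose_matrix_type(num_feedback_images: int, num_feature_elements: int) -> str:
    """Pick the form of the distance matrix for one feature (hypothetical helper).

    'diagonal': only weights the individual vector elements.
    'full': also transforms the features into a higher level feature space.
    """
    if num_feedback_images < 0 or num_feature_elements <= 0:
        raise ValueError("feedback count must be >= 0 and feature length > 0")
    if num_feedback_images < num_feature_elements:
        # Fewer feedback images than feature elements: use a diagonal matrix.
        return "diagonal"
    # Feedback images equal or exceed the number of feature elements.
    return "full"
```

For example, with a 6-element feature vector, feedback on 4 images selects the diagonal matrix, while feedback on 6 or more images selects the full matrix.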

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention is illustrated by way of example and not limitation in 
the figures of the accompanying drawings. The same numbers are used 
throughout the figures to reference like components and/or features. 

Fig. 1 is a block diagram illustrating an exemplary network environment 
such as may be used in accordance with certain embodiments of the invention. 

Fig. 2 illustrates an example of a suitable operating environment in which 
the invention may be implemented. 

Fig. 3 is a block diagram illustrating an exemplary image retrieval 
architecture in accordance with certain embodiments of the invention. 

Fig. 4 is a flowchart illustrating an exemplary process, from the perspective 
of a client, for using relevance feedback to retrieve images. 

Fig. 5 is a flowchart illustrating an exemplary process, from the perspective 
of an image server, for using relevance feedback to retrieve images. 

DETAILED DESCRIPTION 

Fig. 1 is a block diagram illustrating an exemplary network environment 
such as may be used in accordance with certain embodiments of the invention. In 
the network environment 100 of Fig. 1, an image server 102 is coupled to one or 
more image collections 104. Each image collection stores one or more images of 
a wide variety of types. In one implementation, the images are still images, 
although it is to be appreciated that other types of images can also be used with the 
invention. For example, each frame of moving video can be treated as a single 
still image. Image collections 104 may be coupled directly to image server 102, 
incorporated into image server 102, or alternatively indirectly coupled to image 
server 102 such as via a network 106. 

Also coupled to image server 102 are one or more client devices 108. Client 
devices 108 may be coupled to image server 102 directly or alternatively indirectly 
(such as via network 106). Image server 102 acts as an interface between clients 
108 and image collections 104. Image server 102 allows clients 108 to retrieve 
images from image collections 104 and render those images. Users of clients 108 
can then input relevance feedback, which is returned to image server 102 and used 
to refine the image retrieval process, as discussed in more detail below. 

Network 106 represents any of a wide variety of wired and/or wireless 
networks, including public and/or private networks (such as the Internet, local area 
networks (LANs), wide area networks (WANs), etc.). A client 108, image server 
102, or image collection 104 can be coupled to network 106 in any of a wide 
variety of conventional manners, such as wired or wireless modems, direct 
network connections, etc. 

Communication among devices coupled to network 106 can be 
accomplished using one or more protocols. In one implementation, network 106 
includes the Internet. Information is communicated among devices coupled to the 
Internet using, for example, the well-known Hypertext Transfer Protocol (HTTP), 
although other protocols (either public and/or proprietary) could alternatively be 
used. 

Fig. 2 illustrates an example of a suitable operating environment in which 
the invention may be implemented. The illustrated operating environment is only 
one example of a suitable operating environment and is not intended to suggest 
any limitation as to the scope of use or functionality of the invention. Other well 
known computing systems, environments, and/or configurations that may be 
suitable for use with the invention include, but are not limited to, personal 
computers, server computers, hand-held or laptop devices, multiprocessor systems, 
microprocessor-based systems, programmable consumer electronics (e.g., digital 
video recorders), gaming consoles, cellular telephones, network PCs, 
minicomputers, mainframe computers, distributed computing environments that 
include any of the above systems or devices, and the like. 

Fig. 2 shows a general example of a computer 142 that can be used in 
accordance with the invention. Computer 142 is shown as an example of a 
computer that can perform the functions of client 108 or server 102 of Fig. 1. 
Computer 142 includes one or more processors or processing units 144, a system 
memory 146, and a bus 148 that couples various system components including the 
system memory 146 to processors 144. 

The bus 148 represents one or more of any of several types of bus 
structures, including a memory bus or memory controller, a peripheral bus, an 
accelerated graphics port, and a processor or local bus using any of a variety of 
bus architectures. The system memory 146 includes read only memory (ROM) 
150 and random access memory (RAM) 152. A basic input/output system (BIOS) 
154, containing the basic routines that help to transfer information between 
elements within computer 142, such as during start-up, is stored in ROM 150. 
Computer 142 further includes a hard disk drive 156 for reading from and writing 
to a hard disk, not shown, connected to bus 148 via a hard disk drive interface 157 
(e.g., a SCSI, ATA, or other type of interface); a magnetic disk drive 158 for 
reading from and writing to a removable magnetic disk 160, connected to bus 148 
via a magnetic disk drive interface 161; and an optical disk drive 162 for reading 
from and/or writing to a removable optical disk 164 such as a CD ROM, DVD, or 
other optical media, connected to bus 148 via an optical drive interface 165. The 
drives and their associated computer-readable media provide nonvolatile storage 
of computer readable instructions, data structures, program modules and other data 
for computer 142. Although the exemplary environment described herein employs 
a hard disk, a removable magnetic disk 160 and a removable optical disk 164, it 
will be appreciated by those skilled in the art that other types of computer readable 
media which can store data that is accessible by a computer, such as magnetic 
cassettes, flash memory cards, random access memories (RAMs), read only 
memories (ROM), and the like, may also be used in the exemplary operating 
environment. 

Lee & Hayes, PLLC 
MS1-610.PAT APP DOC 

A number of program modules may be stored on the hard disk, magnetic 
disk 160, optical disk 164, ROM 150, or RAM 152, including an operating system 
170, one or more application programs 172, other program modules 174, and 
program data 176. A user may enter commands and information into computer 
142 through input devices such as keyboard 178 and pointing device 180. Other 
input devices (not shown) may include a microphone, joystick, game pad, satellite 
dish, scanner, or the like. These and other input devices are connected to the 
processing unit 144 through an interface 168 that is coupled to the system bus 
(e.g., a serial port interface, a parallel port interface, a universal serial bus (USB) 
interface, etc.). A monitor 184 or other type of display device is also connected to 
the system bus 148 via an interface, such as a video adapter 186. In addition to the 
monitor, personal computers typically include other peripheral output devices (not 
shown) such as speakers and printers. 

Computer 142 operates in a networked environment using logical 
connections to one or more remote computers, such as a remote computer 188. 
The remote computer 188 may be another personal computer, a server, a router, a 
network PC, a peer device or other common network node, and typically includes 
many or all of the elements described above relative to computer 142, although 
only a memory storage device 190 has been illustrated in Fig. 2. The logical 
connections depicted in Fig. 2 include a local area network (LAN) 192 and a wide 
area network (WAN) 194. Such networking environments are commonplace in 
offices, enterprise-wide computer networks, intranets, and the Internet. In certain 
embodiments of the invention, computer 142 executes an Internet Web browser 
program (which may optionally be integrated into the operating system 170) such 
as the "Internet Explorer" Web browser manufactured and distributed by 
Microsoft Corporation of Redmond, Washington. 

When used in a LAN networking environment, computer 142 is connected 
to the local network 192 through a network interface or adapter 196. When used 
in a WAN networking environment, computer 142 typically includes a modem 198 
or other means for establishing communications over the wide area network 194, 
such as the Internet. The modem 198, which may be internal or external, is 
connected to the system bus 148 via a serial port interface 168. In a networked 
environment, program modules depicted relative to the personal computer 142, or 
portions thereof, may be stored in the remote memory storage device. It will be 
appreciated that the network connections shown are exemplary and other means of 
establishing a communications link between the computers may be used. 

Computer 142 also includes a broadcast tuner 200. Broadcast tuner 200 
receives broadcast signals either directly (e.g., analog or digital cable 
transmissions fed directly into tuner 200) or via a reception device (e.g., via 
antenna 110 or satellite dish 114 of Fig. 1). 

Computer 142 typically includes at least some form of computer readable 
media. Computer readable media can be any available media that can be accessed 
by computer 142. By way of example, and not limitation, computer readable 
media may comprise computer storage media and communication media. 
Computer storage media includes volatile and nonvolatile, removable and non- 
removable media implemented in any method or technology for storage of 
information such as computer readable instructions, data structures, program 
modules or other data. Computer storage media includes, but is not limited to, 
RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, 
digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic 
tape, magnetic disk storage or other magnetic storage devices, or any other media 
which can be used to store the desired information and which can be accessed by 
computer 142. Communication media typically embodies computer readable 
instructions, data structures, program modules or other data in a modulated data 
signal such as a carrier wave or other transport mechanism and includes any 
information delivery media. The term "modulated data signal" means a signal that 
has one or more of its characteristics set or changed in such a manner as to encode 
information in the signal. By way of example, and not limitation, communication 
media includes wired media such as wired network or direct-wired connection, 
and wireless media such as acoustic, RF, infrared and other wireless media. 
Combinations of any of the above should also be included within the scope of 
computer readable media. 

The invention has been described in part in the general context of 
computer-executable instructions, such as program modules, executed by one or 
more computers or other devices. Generally, program modules include routines, 
programs, objects, components, data structures, etc. that perform particular tasks 
or implement particular abstract data types. Typically the functionality of the 
program modules may be combined or distributed as desired in various 
embodiments. 

For purposes of illustration, programs and other executable program 
components such as the operating system are illustrated herein as discrete blocks, 
although it is recognized that such programs and components reside at various 
times in different storage components of the computer, and are executed by the 
data processor(s) of the computer. 

Alternatively, the invention may be implemented in hardware or a 
combination of hardware, software, and/or firmware. For example, one or more 
application specific integrated circuits (ASICs) could be designed or programmed 
to carry out the invention. 

Fig. 3 is a block diagram illustrating an exemplary image retrieval 
architecture in accordance with certain embodiments of the invention. The image 
retrieval architecture 220 illustrated in Fig. 3 is implemented, for example, in an 
image server 102 of Fig. 1. Architecture 220 includes a query vector generator 
222, a comparator 224, multiple images 226 and corresponding low-level image 
features 228, and an image retriever 230. 

Multiple low-level features are extracted for each image 226. These 
features are described as being extracted prior to the image retrieval process 
discussed herein, although the features could alternatively be extracted during the 
image retrieval process. Each feature is a vector (referred to as a feature vector) 
that includes multiple feature elements. The number of feature elements in a 
feature vector can vary on a per-feature basis. 

Low-level image features 228 can include any of a wide variety of 
conventional features, such as: color moment features, color histogram features, 
wavelet texture features, Fourier descriptor features, water-fill edge features, etc. 
In one implementation, low-level features 228 include three features: a color 
moments feature, a wavelet based texture feature, and a water-fill edge feature. 
The color moments feature is a 6-element vector obtained by extracting the mean 
and standard deviation from three color channels in the HSV (hue, saturation, 
value) color space. The wavelet based texture feature is a 10-element vector 
obtained by a wavelet filter bank decomposing the image into 10 de-correlated 
sub-bands, with each sub-band capturing the characteristics of a certain scale and 
orientation of the original image. The standard deviation of the wavelet 
coefficients for each sub-band is extracted, and these standard deviations are used as 
the elements of the feature vector. The water-fill edge feature is an 18-element 
vector that is obtained by extracting 18 different elements from the edge maps: 
the maximum filling time and associated fork count, the maximum fork count and 
associated filling time, the filling time histogram for each of seven bins (ranges of 
values), and the fork count histogram for each of seven bins. Additional 
information regarding the water-fill edge feature can be found in Xiang Sean 
Zhou, Yong Rui, and Thomas S. Huang, "Water-Filling: A Novel Way for Image 
Structural Feature Extraction", Proc. of IEEE International Conference on Image 
Processing, Kobe, Japan, October 1999, which is hereby incorporated by 
reference. 
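
The color moments feature described above can be sketched as follows. The sketch assumes the image has already been converted to HSV and is supplied as a non-empty list of (h, s, v) tuples, uses the population standard deviation, and orders the output as mean then standard deviation for each channel; none of these details is specified above, and `color_moments` is a hypothetical name. Hue is treated as a linear quantity, ignoring its circular wrap-around.

```python
from statistics import fmean, pstdev


def color_moments(hsv_pixels):
    """Return a 6-element color moments vector (hypothetical helper).

    hsv_pixels: non-empty sequence of (h, s, v) tuples (assumed input format).
    Output order (assumed): [mean_h, std_h, mean_s, std_s, mean_v, std_v].
    """
    if not hsv_pixels:
        raise ValueError("image has no pixels")
    features = []
    for channel in range(3):  # 0 = hue, 1 = saturation, 2 = value
        values = [pixel[channel] for pixel in hsv_pixels]
        features.append(fmean(values))   # first moment: mean
        features.append(pstdev(values))  # second moment: population std. dev.
    return features
```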

Low-level image features 228 can be stored and made accessible in any of a 
wide variety of formats. In one implementation, the low-level features 228 are 
generated and stored in accordance with the MPEG-7 (Moving Picture Experts 
Group) format. The MPEG-7 format standardizes a set of Descriptors (Ds) that 
can be used to describe various types of multimedia content, as well as a set of 
Description Schemes (DSs) to specify the structure of the Ds and their 
relationship. In MPEG-7, the individual features 228 are each described as one or 
more Descriptors, and the combination of features is described as a Description 
Scheme. 

During the image retrieval process, search criteria in the form of an initial 
image selection 232 is input to query vector generator 222. The initial image 
selection 232 can be in any of a wide variety of forms. For example, the initial 
image may be an image chosen from images 226 in accordance with some other 
retrieval process (e.g., based on a descriptive keyword search), the image may be 
an image that belongs to the user and is not included in images 226, etc. The 
initial selection 232 may or may not include low-level features for the image. If 
low-level features that will be used by comparator 224 are not included, then those 
low-level features are generated by query vector generator 222 based on initial 
selection 232 in a conventional manner. Note that these may be the same features 
as low-level image features 228, or alternatively a subset of the features 228. 
However, if the low-level features are already included, then query vector 
generator 222 need not generate them. Regardless of whether generator 222 
generates the low-level features for initial image selection 232, these low-level 
features are output by query vector generator 222 as query vectors 234. 

Comparator 224 performs an image comparison based on the low-level 
image features 228 and the query vectors 234. This comparison may include 
mapping both the low-level image features 228 and the query vectors 234 to a 
higher level feature space and determining how closely the transformed (mapped) 
features and query vectors match. An identification 236 of a set of potentially 
relevant images is then output by comparator 224 to image retriever 230. The 
potentially relevant images are those images that comparator 224 determines have 
low-level image features 228 most closely matching the query vectors. Retriever 
230 obtains the identified images from images 226 and returns those images to the 
requestor (e.g., a client 108 of Fig. 1) as potentially relevant images 238. 

A user is then able to provide relevance feedback 240 to query vector 
generator 222. In one implementation, each of the potentially relevant images 238 
is displayed to the user at a client device along with a corresponding graphical 
"degree of relevance" slider. The user is able to slide the slider along a slide bar 
ranging from, for example, "Not Relevant" to "Highly Relevant". Each position 
along the slide bar at which the user can place the slider has a 
corresponding value that is returned to the generator 222 and comparator 224 and 
incorporated into their processes as discussed in more detail below. In one 
implementation, if the user provides no feedback, then a default relevancy 
feedback is assigned to the image (e.g., equivalent to "no opinion"). Alternatively, 
other user interface mechanisms may be used to receive user feedback, such as 
radio buttons corresponding to multiple different relevancy feedbacks (e.g., Highly 
Relevant, Relevant, No Opinion, Irrelevant, and Highly Irrelevant), verbal 
feedback (e.g., via speech recognition), etc. 

The relevance feedback is used by query vector generator 222 to generate a 
new query vector and comparator 224 to identify a new set of potentially relevant 
images. The user relevance feedback 240 can be numeric values that are directly 
used by generator 222 and comparator 224, such as: an integer or real value from 
zero to ten; an integer or real value from negative five to positive five; values 
corresponding to highly relevant, somewhat relevant, no opinion, somewhat 
irrelevant, and highly irrelevant of 7, 3, 0, -3, and -7, respectively. Alternatively, 
the user relevance feedback 240 can be an indication in some other format (e.g., 
the text or encoding of "Highly Relevant") that is converted to a usable numeric 
value by generator 222, comparator 224, and/or another component (not 
illustrated). 
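
The last numeric encoding listed above (7, 3, 0, -3, -7) can be sketched as a simple lookup from a textual label. The exact label strings and the function name `feedback_to_value` are hypothetical; a client could equally send any other agreed encoding.

```python
# Hypothetical label strings mapped to the five-level values described above.
FEEDBACK_VALUES = {
    "Highly Relevant": 7,
    "Somewhat Relevant": 3,
    "No Opinion": 0,
    "Somewhat Irrelevant": -3,
    "Highly Irrelevant": -7,
}


def feedback_to_value(label: str) -> int:
    """Convert a textual relevance label into its numeric degree of relevance."""
    try:
        return FEEDBACK_VALUES[label.strip()]
    except KeyError:
        raise ValueError(f"unknown relevance label: {label!r}") from None
```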

The second set of potentially relevant images displayed to the user is 
determined by comparator 224 incorporating the relevance feedback 240 received 
from the user into the comparison process. This process can be repeated any 
number of times, with the feedback provided each time being used to further refine 
the image retrieval process. 

Note that the components illustrated in architecture 220 may be distributed 
across multiple devices. For example, low-level features 228 may be stored 
locally at image server 102 of Fig. 1 (e.g., on a local hard drive) while images 226 
may be stored at one or more remote locations (e.g., accessed via network 106). 

The image retrieval process discussed herein refers to several different 
types of matrixes, including diagonal matrixes, full matrixes, and the identity 
matrix. A diagonal matrix refers to a matrix that can have any value along the 
diagonal, where the diagonal of a matrix $B$ consists of the elements at 
positions $B_{jj}$, and all values not along the diagonal are zero. The identity matrix is a 
special case of the diagonal matrix where the elements of the matrix along the 
diagonal all have the value of one and all other elements in the matrix have a value 
of zero. A full matrix is a matrix in which any element can have any value. These 
different types of matrixes are well-known to those skilled in the art, and thus will 
not be discussed further except as they pertain to the present invention. 

The specific manner in which query vectors are generated, comparisons are 
made, and relevance feedback is incorporated into both of these processes will 
now be described. It is to be appreciated that these specific manners described are 
only examples of the processes and that various modifications can be made to 
these descriptions. 

Each single image of the images 226 has multiple ($I$) corresponding low-level 
features in the features 228. As used herein, $\vec{x}_{mi}$ refers to the $i$th feature 
vector of the $m$th image, so: 

$$\vec{x}_{mi} = [x_{mi1}, \ldots, x_{mik}, \ldots, x_{miK_i}]$$

where $K_i$ is the length of the feature vector $\vec{x}_{mi}$. 

A query vector is generated as necessary for each of the low-level feature 
spaces. The query vector is initially generated by extracting the low-level feature 
elements in each of the feature spaces from the initial selection 232. The query 
vector can be subsequently modified by the relevance feedback 240, as discussed 
in more detail below. The query vector in a feature space $i$ is: 

$$\vec{q}_i = [q_{i1}, \ldots, q_{ik}, \ldots, q_{iK_i}]$$

To compare the query vector ($\vec{q}_i$) and a corresponding feature vector of an 
image $m$ ($\vec{x}_{mi}$), the distance between the two vectors is determined. A wide 
variety of different distance metrics can be used, and in one implementation the 
generalized Euclidean distance is used. The generalized Euclidean distance 
between the two vectors, referred to as $g_{mi}$, is calculated as follows: 

$$g_{mi} = (\vec{q}_i - \vec{x}_{mi})^T W_i (\vec{q}_i - \vec{x}_{mi})$$

where $W_i$ is a matrix that both optionally transforms the low-level feature space 
into a higher level feature space and then assigns weights to each feature element 
in the higher level feature space. When sufficient data is available to perform the 
transformation, the low-level feature space is transformed into a higher level 
feature space that better models user-desired high-level concepts. 

The matrix $W_i$ can be decomposed as follows: 

$$W_i = P_i^T \Lambda_i P_i$$

where $P_i$ is an orthonormal matrix consisting of the eigenvectors of $W_i$ and $\Lambda_i$ is 
a diagonal matrix whose diagonal elements are the eigenvalues of $W_i$. Thus, the 
calculation to determine the distance $g_{mi}$ can be rewritten as: 

$$g_{mi} = \left(P_i (\vec{q}_i - \vec{x}_{mi})\right)^T \Lambda_i \left(P_i (\vec{q}_i - \vec{x}_{mi})\right)$$

where the low-level feature space is transformed into the higher level feature space 
by the mapping matrix $P_i$ and then weights are assigned to the feature elements of 
the new feature space by the weighting matrix $\Lambda_i$. 
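
The equivalence between the two forms of the distance $g_{mi}$ can be checked numerically with the sketch below. It is a minimal pure-Python illustration assuming small dense matrices stored as lists of rows; all helper names are hypothetical, and a production system would more likely use a linear algebra library. Rather than computing an eigendecomposition, the example builds $W_i$ from a chosen orthonormal $P_i$ (a 2-D rotation) and a chosen diagonal $\Lambda_i$.

```python
import math


def mat_vec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(a_jk * v_k for a_jk, v_k in zip(row, v)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def mat_mul(a, b):
    """Multiply matrices a and b (lists of rows)."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def generalized_euclidean(q, x, w):
    """g = (q - x)^T W (q - x)."""
    d = [q_k - x_k for q_k, x_k in zip(q, x)]
    w_d = mat_vec(w, d)
    return sum(d_k * wd_k for d_k, wd_k in zip(d, w_d))


def transformed_distance(q, x, p, lam):
    """g = (P(q - x))^T Lambda (P(q - x)); lam holds the diagonal of Lambda."""
    d = [q_k - x_k for q_k, x_k in zip(q, x)]
    y = mat_vec(p, d)  # map the difference into the higher level feature space
    return sum(l_k * y_k * y_k for l_k, y_k in zip(lam, y))  # weight each axis


# Illustrative values: P is a rotation (orthonormal), Lambda weights rotated axes.
theta = math.pi / 6
P = [[math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]
lam = [2.0, 0.5]
W = mat_mul(mat_mul(transpose(P), [[lam[0], 0.0], [0.0, lam[1]]]), P)  # P^T Lambda P

q, x = [1.0, 2.0], [0.5, 1.0]
g_full = generalized_euclidean(q, x, W)
g_split = transformed_distance(q, x, P, lam)
```

Here `g_full` and `g_split` agree up to floating-point rounding: $P_i$ performs the mapping and $\Lambda_i$ the weighting. With $W_i$ equal to the identity matrix, the distance reduces to the squared Euclidean distance.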

However, in some situations there may be insufficient data to reliably 
perform the transformation into the higher level feature space. In such situations, 
the matrix $W_i$ is simply the weighting matrix $\Lambda_i$, so $g_{mi}$ can be rewritten as: 

$$g_{mi} = (\vec{q}_i - \vec{x}_{mi})^T \Lambda_i (\vec{q}_i - \vec{x}_{mi})$$

Typically, each of multiple ($I$) low-level feature vectors of images in the 
database is compared to a corresponding query vector and the individual distances 
between these vectors are determined. Once all of the $I$ low-level feature vectors have 
been compared to the corresponding query vectors and distances determined, these 
distances are combined to generate an overall distance $d_m$, which is defined as 
follows: 

$$d_m = U(g_{mi})$$

where $U(\cdot)$ is a function that combines the individual distances $g_{mi}$ to form the 
overall distance $d_m$. Thus, a hierarchical approach is taken to determining how 
closely two images match: first individual distances between the feature vectors 
and the query vectors are determined, and then these individual distances are 
combined. 

The function $U(\cdot)$ can be any of a variety of different combinatorial 
functions. In one implementation, the function $U(\cdot)$ is a weighted summation of 
the individual distances, resulting in: 

$$d_m = \sum_{i=1}^{I} u_i \, g_{mi}$$

The feature vectors of the individual images ($\vec{x}_{mi}$) are known (they are features 
228). The additional values needed to solve for the overall distance $d_m$ are: the 
weights ($u_i$) of each individual feature distance, the query vector ($\vec{q}_i$) for each 
feature, and the transformation matrix ($W_i$) for each feature. For the first 
comparison (before any relevance feedback 240 is received), each query vector 
($\vec{q}_i$) is simply the corresponding extracted feature elements of the initial selection 
232, the weights ($u_i$) of each individual distance are the same (e.g., a value of $1/I$, 
where $I$ is the number of features used), and each transformation matrix ($W_i$) is 
the identity matrix. The determination of these individual values based on 
relevance feedback is discussed in more detail below. 
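
The hierarchical computation for the first comparison, before any feedback, can be sketched as follows. Each per-feature distance uses the identity matrix, so it reduces to the squared Euclidean distance, and the per-feature distances are combined by the weighted summation with equal weights of 1/I. The function names are hypothetical and the vectors are plain Python lists.

```python
def feature_distance(q, x, w):
    """Generalized Euclidean distance g = (q - x)^T W (q - x) for one feature."""
    d = [q_k - x_k for q_k, x_k in zip(q, x)]
    return sum(d_j * sum(w_jk * d_k for w_jk, d_k in zip(row, d))
               for d_j, row in zip(d, w))


def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]


def overall_distance(query_vectors, feature_vectors, weights, matrices):
    """d_m = sum_i u_i * g_mi: combine the per-feature distances."""
    return sum(u_i * feature_distance(q_i, x_i, w_i)
               for u_i, q_i, x_i, w_i
               in zip(weights, query_vectors, feature_vectors, matrices))


def initial_parameters(query_vectors):
    """Before feedback: equal weights 1/I and an identity matrix per feature."""
    num_features = len(query_vectors)
    weights = [1.0 / num_features] * num_features
    matrices = [identity(len(q_i)) for q_i in query_vectors]
    return weights, matrices
```

For example, with two features whose squared Euclidean distances to a given image are 1 and 4, the initial overall distance is 0.5 × 1 + 0.5 × 4 = 2.5.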

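Alternatively, the generalized Euclidean distance could also be used to 
compute $d_m$, as follows: 

$$d_m = \vec{g}_m^T U \vec{g}_m$$

where $\vec{g}_m = [g_{m1}, \ldots, g_{mI}]^T$ is the vector of individual distances and $U$ is an $(I \times I)$ full matrix. 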
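
The alternative combination can be sketched as a quadratic form over the vector of per-feature distances. The function name `quadratic_combination` is hypothetical. Even with a diagonal $U$, this form weights the squared distances, so it is not the same as the weighted summation.

```python
def quadratic_combination(g, u):
    """d_m = g^T U g, where g lists the I per-feature distances and U is I x I."""
    if len(u) != len(g) or any(len(row) != len(g) for row in u):
        raise ValueError("U must be an I x I matrix matching the length of g")
    return sum(g_j * sum(u_jk * g_k for u_jk, g_k in zip(row, g))
               for g_j, row in zip(g, u))
```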

The overall distance d^ is thus calculated for each image 226. 
Alternatively, the overall distance d^ may be calculated for only a subset of 
images 226. Which subset of images 226 to use can be identified in any of a 
variety of manners, such as using well-known multi-dimensional indexing 
techniques (e.g., R-tree or R*-tree). 

A number of images 226 having the smallest distance $d_m$ are then selected 
as potentially relevant images to be presented to a user. The number of images 
226 can vary, and in one implementation is determined empirically based on both 
the size of display devices typically being used to view the images and the size of 
the images themselves. In one implementation, twenty images are returned as 
potentially relevant. 
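Selecting the images with the smallest $d_n$ can be sketched with a partial sort. The use of `heapq.nsmallest` is an implementation choice assumed here, not something the description specifies; the default of twenty mirrors the one implementation mentioned above, and the function name is hypothetical.

```python
import heapq
from typing import List, Sequence


def select_potentially_relevant(distances: Sequence[float], k: int = 20) -> List[int]:
    """Return indices of the k images with the smallest overall distance d_n,
    ordered from closest to farthest."""
    return heapq.nsmallest(k, range(len(distances)), key=distances.__getitem__)


d = [5.0, 1.0, 3.0, 0.5]
print(select_potentially_relevant(d, k=2))  # [3, 1]
```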
User relevance feedback 240 identifies degrees of relevance for one or
more of the potentially relevant images 238 (that is, a value indicating how
relevant each of one or more of the images 238 is). A user may indicate that only
selected ones of the images 238 are relevant, in which case user relevance feedback 240
identifies degrees of relevance for only those selected images. Alternatively, user
relevance feedback 240 may identify degrees of relevance for all images 238, such
as by assigning a default value to those images for which the user did not assign a
relevance. These default values (and corresponding image features) can then be
ignored by query vector generator 222 and comparator 224 (e.g., dropped from
relevance feedback 240), or alternatively treated as user input feedback and used
by query vector generator 222 and comparator 224 when generating new values.

Once relevance feedback 240 is received, query vector generator 222
generates new query vectors 234. The new query vectors are referred to as $\vec{q}_i^{*}$,
and are defined as follows:

$$\vec{q}_i^{*T} = \frac{\pi X_i}{\sum_{n=1}^{N} \pi_n}$$

where $N$ represents the number of potentially relevant images for which the user
input relevance feedback (e.g., non-default relevance values were returned), which
can be less than the number of potentially relevant images that were displayed to
the user ($N$ may also be referred to as the number of training samples); $\pi_n$
represents the degree of relevance of image $n$ as indicated by the relevance
feedback from the user (that is, a degree of relevance value associated with the
relevance indicated by the user); $\pi$ represents a ($1 \times N$) vector of the individual
$\pi_n$ values; and $X_i$ represents a training sample matrix for feature $i$ that is
obtained by stacking the $N$ training vectors ($\vec{x}_{ni}$) into a matrix, resulting in an
($N \times K_i$) matrix.
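Because $\pi X_i$ sums the rows of $X_i$ weighted by $\pi_n$, the new query vector is the relevance-weighted average of the training vectors for feature $i$. A minimal Python sketch, with a hypothetical function name and example values:

```python
from typing import List, Sequence


def new_query_vector(relevance: Sequence[float],
                     samples: Sequence[Sequence[float]]) -> List[float]:
    """q_i*^T = (pi X_i) / sum_n pi_n: the relevance-weighted average of the
    N training vectors (the rows of X_i) for one feature."""
    total = sum(relevance)
    if total <= 0:
        raise ValueError("relevance values must have a positive sum")
    k = len(samples[0])
    return [sum(p * row[c] for p, row in zip(relevance, samples)) / total
            for c in range(k)]


X_i = [[0.0, 0.0], [2.0, 4.0]]  # N = 2 training vectors, K_i = 2
pi = [1.0, 3.0]                 # hypothetical degrees of relevance
print(new_query_vector(pi, X_i))  # [1.5, 3.0]
```

The more relevant second image pulls the query vector three quarters of the way toward it.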

Alternatively, $N$ (both here and elsewhere in this discussion) may
represent the number of potentially relevant images for which relevance feedback 
was received regardless of the source (e.g., including both user-input feedback and 
default relevance values). 

The process of presenting potentially relevant images to a user and 
receiving relevance feedback for at least portions of that set of potentially relevant 
images can be repeated multiple times. The results of each set of feedback can be 
saved and used for determining subsequent query vectors (as well as the weights 
($u_i$) of each individual distance and each transformation matrix ($W_i$)) in the
process, or alternatively only a certain number of preceding sets of feedback may 
be used. For example, if three sets of twenty images each are presented to a user 
and relevance feedback returned for each image of the three sets, then to generate 
the fourth set the feedback from all sixty images may be used. Alternatively, only 
the feedback from the most recent set of twenty images may be used (or the two 
most recent sets, etc.). 
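One way to support both policies is to retain feedback rounds in a bounded queue. This is a sketch under the assumption that each training sample is stored as a (relevance, feature vector) pair; the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple

# One training sample: (degree of relevance pi_n, feature vector x_n).
Sample = Tuple[float, Sequence[float]]


class FeedbackHistory:
    """Keeps relevance feedback from successive rounds.

    max_rounds=None keeps every round; an integer keeps only that many of the
    most recent rounds (older rounds are discarded automatically)."""

    def __init__(self, max_rounds: Optional[int] = None) -> None:
        self._rounds: Deque[List[Sample]] = deque(maxlen=max_rounds)

    def add_round(self, samples: List[Sample]) -> None:
        self._rounds.append(samples)

    def training_samples(self) -> List[Sample]:
        """Flatten the retained rounds into one list of training samples."""
        return [s for round_ in self._rounds for s in round_]


all_rounds = FeedbackHistory()               # use feedback from every set
latest_only = FeedbackHistory(max_rounds=1)  # use only the most recent set
for r in range(3):
    batch = [(1.0, [float(r), float(i)]) for i in range(20)]
    all_rounds.add_round(batch)
    latest_only.add_round(batch)
print(len(all_rounds.training_samples()), len(latest_only.training_samples()))  # 60 20
```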

Comparator 224 also receives relevance feedback 240 and uses relevance
feedback 240 to generate a new value for $W_i$, which is referred to as $W_i^{*}$. The
value of $W_i^{*}$ is either a full matrix or a diagonal matrix. When the number of
potentially relevant images for which the user input relevance feedback ($N$) is less
than the length of the feature vector ($K_i$), $W_i^{*}$ cannot be calculated as a full matrix
(and is difficult to reliably estimate, if possible at all), because the weighted
covariance matrix $C_i$ described below then has rank less than $K_i$ and cannot be
inverted. Thus, in situations where $N < K_i$, $W_i^{*}$ is a diagonal matrix; otherwise
$W_i^{*}$ is a full matrix.
To generate the full matrix, $W_i^{*}$ is calculated as follows:

$$W_i^{*} = \left(\det(C_i)\right)^{\frac{1}{K_i}} C_i^{-1}$$

where $\det(C_i)$ is the matrix determinant of $C_i$, and $C_i$ is the ($K_i \times K_i$) weighted
covariance matrix of $X_i$. In other words,

$$C_{i,rs} = \frac{\sum_{n=1}^{N} \pi_n (x_{nir} - q_{ir})(x_{nis} - q_{is})}{\sum_{n=1}^{N} \pi_n}$$

where $r$ is the row index of the matrix $C_i$ and ranges from 1 to $K_i$; $s$ is the
column index of the matrix $C_i$ and ranges from 1 to $K_i$; $N$ represents the number
of potentially relevant images for which the user input relevance feedback; $\pi_n$
represents the degree of relevance of image $n$; $x_{nir}$ refers to the $r$th element of the
feature vector for feature $i$ of image $n$; $q_{ir}$ refers to the $r$th element of the query
vector for feature $i$; $x_{nis}$ refers to the $s$th element of the feature vector for feature $i$
of the $n$th image; and $q_{is}$ refers to the $s$th element of the query vector for feature $i$.
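The full-matrix case can be sketched in plain Python as follows. This assumes the query vector passed in is the updated $\vec{q}_i^{*}$, and it uses Gauss-Jordan elimination (a standard technique chosen here, not one the description names) to obtain both $\det(C_i)$ and $C_i^{-1}$. All function names are hypothetical. The scaling factor makes $\det(W_i^{*}) = 1$, which the example checks.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def weighted_covariance(relevance: Sequence[float],
                        samples: Sequence[Sequence[float]],
                        q: Sequence[float]) -> Matrix:
    """C_i[r][s] = sum_n pi_n (x_nr - q_r)(x_ns - q_s) / sum_n pi_n."""
    total = sum(relevance)
    k = len(q)
    return [[sum(p * (x[r] - q[r]) * (x[s] - q[s]) for p, x in zip(relevance, samples)) / total
             for s in range(k)] for r in range(k)]


def det_and_inverse(m: Matrix) -> Tuple[float, Matrix]:
    """Gauss-Jordan elimination with partial pivoting; returns (det, inverse)."""
    n = len(m)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular; use a diagonal W_i")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # each row swap flips the sign of the determinant
        pv = a[col][col]
        det *= pv
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [rv - factor * cv for rv, cv in zip(a[r], a[col])]
    return det, [row[n:] for row in a]


def full_transformation_matrix(relevance: Sequence[float],
                               samples: Sequence[Sequence[float]],
                               q: Sequence[float]) -> Matrix:
    """W_i* = det(C_i)^(1/K_i) * C_i^{-1}, which normalizes det(W_i*) to 1."""
    c = weighted_covariance(relevance, samples, q)
    det_c, c_inv = det_and_inverse(c)
    scale = det_c ** (1.0 / len(q))
    return [[scale * v for v in row] for row in c_inv]


X_i = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]  # N = 3 >= K_i = 2
pi = [1.0, 1.0, 1.0]
q_i = [2.0, 2.0]                             # relevance-weighted mean of the rows
W = full_transformation_matrix(pi, X_i, q_i)
print(round(det_and_inverse(W)[0], 9))  # 1.0
```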

To generate the diagonal matrix, each diagonal element of the matrix is
calculated as follows:

$$w_{kk} = \frac{1}{\sigma_k}$$

where $w_{kk}$ is the $kk$th element of matrix $W_i$ and $\sigma_k$ is the standard deviation of the
sequence of $x_{ik}$'s, where each $x_{ik}$ is the $k$th element of feature $i$.
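A sketch of the diagonal case follows. The description does not say which standard-deviation estimator is used or whether it is weighted by $\pi_n$, so this sketch assumes the unweighted population standard deviation (`statistics.pstdev`). The `eps` guard for constant elements is a hypothetical addition, and the function name is also hypothetical.

```python
import statistics
from typing import List, Sequence


def diagonal_transformation_matrix(samples: Sequence[Sequence[float]],
                                   eps: float = 1e-9) -> List[List[float]]:
    """Diagonal W_i with w_kk = 1 / sigma_k, where sigma_k is the standard
    deviation of the k-th element across the training vectors."""
    k = len(samples[0])
    w = [[0.0] * k for _ in range(k)]
    for col in range(k):
        sigma = statistics.pstdev([row[col] for row in samples])
        w[col][col] = 1.0 / max(sigma, eps)  # eps guards elements that never vary
    return w


X_i = [[1.0, 10.0, 0.0], [3.0, 14.0, 0.0]]  # N = 2 < K_i = 3, so no full W_i
W = diagonal_transformation_matrix(X_i)
print(W[0][0], W[1][1])  # 1.0 0.5
```

Elements that vary little among the relevant images receive large weights, so they dominate the distance.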

It should be noted that the determination of whether $W_i$ is to be a full
matrix or a diagonal matrix is done on a per-image basis as well as a per-feature
basis for each image. Thus, depending on the length of each feature vector, $W_i$
may be different types of matrices for different features.
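The selection rule reduces to a single comparison per feature. In this sketch the feature names and lengths $K_i$ are hypothetical, and they show how one value of $N$ can produce different matrix types for different features:

```python
from typing import Dict


def select_matrix_type(num_training_samples: int, feature_length: int) -> str:
    """Diagonal W_i when N < K_i (too few samples to invert C_i); full otherwise."""
    return "diagonal" if num_training_samples < feature_length else "full"


# Hypothetical feature lengths K_i; the same N yields different types.
feature_lengths: Dict[str, int] = {"feature_a": 6, "feature_b": 12, "feature_c": 24}
N = 12
print({name: select_matrix_type(N, k) for name, k in feature_lengths.items()})
# {'feature_a': 'full', 'feature_b': 'full', 'feature_c': 'diagonal'}
```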

It should also be noted that in situations where $W_i$ is a diagonal matrix, the
distance ($g_{ni}$) between a query vector ($\vec{q}_i$) and a feature vector ($\vec{x}_{ni}$) is based on
weighting the feature elements but not transforming the feature elements to a
higher level feature space. This is because there is an insufficient number of
training samples to reliably perform the transformation. However, in situations
where $W_i$ is a full matrix, the distance ($g_{ni}$) between a query vector ($\vec{q}_i$) and a
feature vector ($\vec{x}_{ni}$) is based on both transforming the low-level features to a
higher level feature space and weighting the transformed feature elements.
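One way to see the "transform, then weight" interpretation is to factor a symmetric full matrix as $W_i = P^{T} \Lambda P$, where the rows of $P$ are eigenvectors and $\Lambda$ is diagonal. Then $(\vec{q} - \vec{x})^{T} W_i (\vec{q} - \vec{x}) = (P(\vec{q} - \vec{x}))^{T} \Lambda (P(\vec{q} - \vec{x}))$, which matches the form in claim 25. The sketch below uses the cyclic Jacobi eigenvalue method, a standard technique chosen here for illustration rather than one named in the description; the function names and matrices are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def jacobi_eigh(a: Matrix, tol: float = 1e-20, max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V) with the eigenvectors as the columns of V."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J and V <- V J (column rotation)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # A <- J^T A (row rotation)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def quad(d: Sequence[float], w: Matrix) -> float:
    """d^T W d."""
    return sum(d[r] * w[r][c] * d[c] for r in range(len(d)) for c in range(len(d)))


W = [[2.0, 1.0], [1.0, 2.0]]          # hypothetical full W_i (symmetric)
q, x = [1.0, 0.0], [0.0, 2.0]
diff = [qi - xi for qi, xi in zip(q, x)]

eigvals, V = jacobi_eigh(W)
P = [list(col) for col in zip(*V)]    # P = V^T: rows are eigenvectors
Lam = [[eigvals[i] if i == j else 0.0 for j in range(2)] for i in range(2)]
y = [sum(P[r][c] * diff[c] for c in range(2)) for r in range(2)]  # y = P (q - x)

print(round(quad(diff, W), 9), round(quad(y, Lam), 9))  # 6.0 6.0
```

When $W_i$ is diagonal, $P$ is the identity and only the weighting by $\Lambda$ remains, matching the form in claim 26.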

Once relevance feedback 240 is received, comparator 224 also generates a
new value for $u_i$, which is referred to as $u_i^{*}$, and is calculated as follows:

$$u_i^{*} = \sum_{j=1}^{I} \sqrt{\frac{f_j}{f_i}}$$

where

$$f_i = \sum_{n=1}^{N} \pi_n\, g_{ni}$$

where $N$ represents the number of potentially relevant images for which the user
input relevance feedback, $\pi_n$ represents the degree of relevance of image $n$, and
$g_{ni}$ represents the distance between the previous query vector ($\vec{q}_i$) and the
feature vector ($\vec{x}_{ni}$).
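A minimal sketch of the weight update follows, assuming the per-image, per-feature distances $g_{ni}$ are supplied as an $N \times I$ list. The function name and the positivity guard are hypothetical additions.

```python
import math
from typing import List, Sequence


def new_feature_weights(relevance: Sequence[float],
                        distances: Sequence[Sequence[float]]) -> List[float]:
    """u_i* = sum_j sqrt(f_j / f_i), with f_i = sum_n pi_n * g_ni.

    distances[n][i] holds g_ni, the distance for image n and feature i
    computed with the previous query vector."""
    num_features = len(distances[0])
    f = [sum(p * g[i] for p, g in zip(relevance, distances)) for i in range(num_features)]
    if any(fi <= 0 for fi in f):
        raise ValueError("each f_i must be positive")  # hypothetical guard
    return [sum(math.sqrt(fj / fi) for fj in f) for fi in f]


pi = [1.0, 1.0]
g = [[0.5, 2.0], [0.5, 2.0]]      # N = 2 images, I = 2 features
print(new_feature_weights(pi, g))  # [3.0, 1.5]
```

Here $f = [1, 4]$: the feature whose relevant images lie closer to the query (smaller $f_i$) receives the larger weight.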

Fig. 4 is a flowchart illustrating an exemplary process, from the perspective 
of a client, for using relevance feedback to retrieve images. The process of Fig. 4 
is carried out by a client 108 of Fig. 1, and can be implemented in software. Fig. 4
is discussed with reference to components in Figs. 1 and 3. 

First, initial search criteria (e.g., an image) are entered by the user (act 260).
The initial search criteria are used by image server 102 to identify potentially
relevant images 238, which are received (from server 102) and rendered at client
108 (act 262) as the initial search results. The client then receives an indication
from the user as to whether the search results are satisfactory (act 264). This
indication can be direct (e.g., selection of an on-screen button indicating that the
results are satisfactory or to stop the retrieval process) or indirect (e.g., input of
relevance feedback indicating that one or more of the images is not relevant). If the
search results are satisfactory, then the process ends (act 266).

However, if the search results are not satisfactory, then the relevance of the
search results is identified (act 268). The relevance of one or more images in the 
search results is identified by user feedback (e.g., user selection of one of multiple 
options indicating how relevant the image is). A new search request that includes 
the relevance feedback regarding the search results is then submitted to server 102 
(act 270). In response to the search request, the server 102 generates new search 
results (based in part on the relevance feedback), which are received by client 108 
(act 272). The process then returns to act 264, allowing for additional user 
relevance feedback as needed. 
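The client-side loop of Fig. 4 can be sketched with two callbacks standing in for image server 102 and for the user. The callback signatures, the use of `None` to signal "satisfactory", and the `max_rounds` cap are all hypothetical choices for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical types: images are identified by string IDs, and feedback maps
# image IDs to degrees of relevance. The user callback returns None when the
# results are satisfactory.
Feedback = Dict[str, float]


def client_retrieval_loop(initial_criteria: str,
                          search: Callable[[str, Optional[Feedback]], List[str]],
                          ask_user: Callable[[List[str]], Optional[Feedback]],
                          max_rounds: int = 10) -> List[str]:
    results = search(initial_criteria, None)          # acts 260-262
    for _ in range(max_rounds):
        feedback = ask_user(results)                  # acts 264 and 268
        if feedback is None:                          # satisfactory: act 266
            break
        results = search(initial_criteria, feedback)  # acts 270-272
    return results


# Toy stand-ins for server 102 and the user (both hypothetical).
def fake_search(criteria: str, feedback: Optional[Feedback]) -> List[str]:
    return ["img1", "img2"] if feedback is None else ["img2", "img3"]


answers = iter([{"img1": 0.0, "img2": 1.0}, None])
final = client_retrieval_loop("query.jpg", fake_search, lambda results: next(answers))
print(final)  # ['img2', 'img3']
```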

Fig. 5 is a flowchart illustrating an exemplary process, from the perspective 
of an image server, for using relevance feedback to retrieve images. The process 
of Fig. 5 is carried out by an image server 102 of Fig. 1, and can be implemented 
in software. Fig. 5 is discussed with reference to components in Figs. 1 and 3. 
To begin the image retrieval process, search criteria are received by image
server 102 (act 282) as initial selection 232, in response to which generator 222
generates multiple query vectors (act 284). Comparator 224 then maps the low-level
feature vectors of images in image collection 104 to a higher level feature
vector for each image and compares the higher level feature vectors to the query
vectors (act 286). The images that most closely match the query vectors (based on
the comparison in act 286) are then identified (act 288) and forwarded to the
requesting client 108 (act 290). Alternatively, in some situations the mapping to
the higher level feature space may not occur, and the comparison and
identification may be performed based on the low-level feature space.

Server 102 then receives user feedback from the requesting client 108 
regarding the relevance of one or more of the identified images (act 292). Upon 
receipt of this relevance feedback, generator 222 generates a new query vector 
based in part on the relevance feedback and comparator 224 uses the relevance 
feedback to generate a new transformation matrix and new feature distance 
weights (act 294). The process then returns to act 286, where the new mapping 
parameters and new query vector are used to identify new images for forwarding 
to the client. 

Conclusion 

Although the description above uses language that is specific to structural 
features and/or methodological acts, it is to be understood that the invention 
defined in the appended claims is not limited to the specific features or acts 
described. Rather, the specific features and acts are disclosed as exemplary forms 
of implementing the invention. 
CLAIMS



1. One or more computer readable media having stored thereon a 
plurality of instructions that, when executed by one or more processors, causes the 
one or more processors to perform acts including: 

receiving an initial image selection; 

generating a plurality of query vectors by extracting, for each query vector, 
one of a plurality of low-level features from the initial image selection; 

selecting a set of potentially relevant images based at least in part on 
distances between the plurality of query vectors and a plurality of feature vectors 
corresponding to low-level features of a plurality of images; 

receiving feedback regarding the relevance of one or more images of the set 
of potentially relevant images; 

generating a new plurality of query vectors based at least in part on the 
feedback; 

generating a weighting of feature elements based at least in part on the 
feedback; and 

selecting a new set of potentially relevant images based at least in part on 
both the weighting of feature elements and distances between the new plurality of 
query vectors and the plurality of feature vectors. 
2. One or more computer readable media as recited in claim 1, wherein
the selecting a new set of potentially relevant images comprises using a matrix in 
determining the distance between one of the new plurality of query vectors and 
one of the plurality of feature vectors, and further comprising dynamically 
selecting the matrix based on both a number of images in the set of potentially 
relevant images for which relevance feedback was input and a number of feature 
elements in the one feature vector. 

3. One or more computer readable media as recited in claim 2, wherein 
the dynamically selecting comprises using a diagonal matrix if the number of 
images in the set of potentially relevant images for which relevance feedback was 
input is less than the number of feature elements in the one feature vector, and 
otherwise using a full matrix. 

4. One or more computer readable media as recited in claim 2, wherein 
the dynamically selecting comprises: 

if the number of images in the set of potentially relevant images for which 
relevance feedback was input is not less than the number of feature elements in the 
one feature vector, then using one matrix that transforms the query vector and the 
one feature vector to a higher-level feature space and then using another matrix 
that assigns a weight to each element of the transformed query vector and the 
transformed feature vector; and 

if the number of images in the set of potentially relevant images is less than 
the number of feature elements in the one feature vector, then using a matrix that 
assigns a weight to each element of the query vector and the one feature vector. 
5. One or more computer readable media as recited in claim 2, wherein
X represents an image matrix that is generated by stacking N feature vectors
corresponding to the set of potentially relevant images for which relevance
feedback was received and resulting in an (N × K) matrix, C represents a
weighted covariance matrix of X, det(C) represents the matrix determinant of C,
and the matrix comprises a full matrix ($W^{*}$) that is generated as follows:

$$W^{*} = \left(\det(C)\right)^{\frac{1}{K}} C^{-1}.$$

6. One or more computer readable media as recited in claim 2, wherein
$w_{kk}$ represents the $kk$th element of matrix W, $x_k$ represents the $k$th feature element,
$\sigma_k$ represents the standard deviation of the sequence of $x_k$'s, and the matrix comprises
a diagonal matrix with each diagonal element ($w_{kk}$) being generated as follows:

$$w_{kk} = \frac{1}{\sigma_k}$$
7. One or more computer readable media as recited in claim 1, wherein
N represents the number of images in the set of potentially relevant images for
which relevance feedback has been received, $\pi_n$ represents the relevance of image
n in the set of images, $\pi$ represents a transposition of a vector generated by
concatenating the individual $\pi_n$ values, and X represents an image matrix that is
generated by stacking N training vectors corresponding to the set of potentially
relevant images into a matrix, and wherein each new query vector ($\vec{q}$) of the new
plurality of query vectors is generated as follows:

$$\vec{q}^{T} = \frac{\pi X}{\sum_{n=1}^{N} \pi_n}$$

8. One or more computer readable media as recited in claim 1, wherein
$f_i$ represents a summation, over the images in the set of potentially relevant
images, of a product of a relevance of the image and a distance between the query
vector and the feature vector, and wherein the selecting a new set of potentially
relevant images comprises combining, for each image, a weighted distance
between the plurality of query vectors and the plurality of feature vectors, and
wherein the weight ($u_i$) for each of a plurality ($I$) of distances between a query
vector and a corresponding feature vector is calculated as:

$$u_i = \sum_{j=1}^{I} \sqrt{\frac{f_j}{f_i}}$$

9. One or more computer readable media as recited in claim 1, wherein
the receiving feedback comprises receiving feedback from a user.
10. One or more computer readable media as recited in claim 1, wherein
the low-level features include: a color moments feature, a wavelet based texture 
feature, and a water-fill edge feature. 

11. A method of selecting between two types of matrixes to be used to 
weight, based on relevance feedback, a plurality of feature elements for image 
retrieval, the method comprising: 

selecting one of the two types of matrixes based on both a number of 
previously retrieved relevant images and a length of a feature vector including the 
plurality of feature elements. 

12. A method as recited in claim 11, wherein the selecting comprises 
selecting one of the two types of matrixes based on both a number of previously 
retrieved potentially relevant images which were identified by a user as being 
relevant, and the length of the feature vector including the plurality of feature 
elements. 

13. A method as recited in claim 11, wherein the plurality of feature 
elements are all elements of the same feature. 

14. A method as recited in claim 11, wherein the selecting comprises 
using a first type of matrix if the number of retrieved relevant images is less than 
the length of the feature vector, and otherwise using a second type of matrix. 
15. A method as recited in claim 14, wherein the first type of matrix
comprises a diagonal matrix and wherein the second type of matrix comprises a
full matrix.

16. A method as recited in claim 11, wherein the selecting comprises 
using a first type of matrix if the length of the feature vector exceeds the number 
of retrieved relevant images by at least a threshold amount, and otherwise using a 
second type of matrix. 

17. A method as recited in claim 16, wherein the first type of matrix 
comprises a full matrix and the second type of matrix comprises a diagonal matrix. 

18. One or more computer readable media including a computer 
program that is executable by a processor to perform the method recited in claim 
11. 

19. One or more computer readable media having stored thereon a 
plurality of instructions that, when executed by one or more processors, causes the 
one or more processors to perform acts including: 

comparing, for each of a plurality of images, a plurality of feature elements 
from a query vector to a plurality of feature elements from a feature vector 
corresponding to the image; 

identifying a number of potentially relevant images based on the 
comparing; 
receiving user feedback regarding the relevancy of one or more of the
potentially relevant images;

re-comparing, for each of the plurality of images, the plurality of feature
elements from the query vector to the plurality of feature elements from the 
feature vector, including using a matrix to compare the feature elements and 
dynamically selecting a type of matrix to use based on both the user feedback and 
the number of the plurality of feature elements; 

identifying a new set of potentially relevant images based on the re- 
comparing; and 

presenting the new set of potentially relevant images to the user. 

20. One or more computer readable media as recited in claim 19, 
wherein the re-comparing comprises dynamically selecting the type of matrix to 
use based on both a number of the potentially relevant images for which user 
feedback has been received and the number of the plurality of feature elements. 

21. One or more computer readable media as recited in claim 19, 
wherein the dynamically weighting comprises using a first type of matrix if the 
number of retrieved relevant images is less than the length of the feature vector, 
and otherwise using a second type of matrix. 

22. One or more computer readable media as recited in claim 21, wherein the
first type of matrix comprises a diagonal matrix and the second type of matrix 
comprises a full matrix. 
23. A method comprising:

generating a query vector corresponding to a feature of one image; 
identifying a feature vector corresponding to the feature of another image; 
identifying a number of training samples for which relevance feedback has 
been received; 

if the number of training samples either equals or exceeds a threshold 
amount, then determining a distance between the query vector and the feature 
vector including transforming the query vector and the feature vector to a higher- 
level feature space and then assigning a weight to each element of the transformed 
query vector and the transformed feature vector; and 

if the number of training samples does not exceed the threshold amount, 
then determining the distance between the query vector and the feature vector 
including assigning a weight to each element of the query vector and the feature 
vector. 

24. A method as recited in claim 23, wherein the feature vector includes 
a plurality of feature elements and wherein the threshold amount comprises the 
number of feature elements in the feature vector. 

25. A method as recited in claim 23, wherein if the number of training 
samples either equals or exceeds the threshold amount, then determining the 
distance ($g$), where $P$ is a mapping matrix, $\vec{q}$ is the query vector, $\vec{x}$ is the feature
vector, and $\Lambda$ is a weighting matrix, as:

$$g = \left(P(\vec{q} - \vec{x})\right)^{T} \Lambda \left(P(\vec{q} - \vec{x})\right).$$
26. A method as recited in claim 23, wherein if the number of training
samples does not exceed the threshold amount, then determining the distance ($g$),
where $\vec{q}$ is the query vector, $\vec{x}$ is the feature vector, and $\Lambda$ is a weighting matrix,
as:

$$g = (\vec{q} - \vec{x})^{T} \Lambda (\vec{q} - \vec{x}).$$

27. A method as recited in claim 23, further comprising:

repeating the generating, identifying of the feature vector, identifying of the
number of training samples, and the determining for each of a plurality of features;
and

identifying how closely the image and the other image match each other by
combining the distances between the query vectors and the feature vectors for the
plurality of features.

28. A method as recited in claim 27, wherein the identifying comprises 
calculating a weighted summation of each of the individual distances for each of 
the plurality of features. 

29. One or more computer readable media including a computer 
program that is executable by a processor to perform the method recited in claim 
23. 

30. A system comprising: 

a query vector generator to generate a query vector corresponding to a 
feature of one image; 
a comparator, coupled to the query vector generator, to,

identify a feature vector corresponding to the feature of another
image,

identify a number of training samples for which relevance feedback 
has been received, 

if the number of training samples either equals or exceeds a 
threshold amount, then to determine a distance between the query vector 
and the feature vector including transforming the query vector and the 
feature vector to a higher-level feature space and then assigning a weight to 
each element of the transformed query vector and the transformed feature 
vector, and 

if the number of training samples does not exceed the threshold 
amount, then to determine the distance between the query vector and the 
feature vector including assigning a weight to each element of the query 
vector and the feature vector. 

31. A method comprising: 

for one of a plurality of images and each of a plurality of features, 

generating, based on the set of search criteria, a query vector for the 
feature, 

identifying a feature vector, corresponding to the image, for the 
feature, and 

determining how closely the feature vector matches the query vector; 
determining how closely the image matches the set of search criteria based
on how closely, for the plurality of features, the feature vectors match the query
vectors.

32. A method as recited in claim 31, wherein generating the query 
vector comprises generating the query vector based at least in part on user 
relevance feedback regarding how relevant images previously displayed to a user 
were. 

33. A method as recited in claim 31, wherein identifying the feature
vector comprises: 

identifying a low-level feature vector corresponding to the feature; and 
mapping the low-level feature vector to a higher level feature space. 

34. A method as recited in claim 33, wherein the identifying the feature
vector further comprises incorporating, into the mapping, relevance feedback. 

35. A method as recited in claim 31, wherein the initial search criteria
comprises an image. 
36. A method as recited in claim 31, wherein the determining how
closely the feature vector matches the query vector comprises determining a
distance between the feature vector and the query vector, and wherein the
determining how closely the image matches the set of search criteria comprises
calculating a weighted summation of each of the individual distances between the
feature vectors and the query vectors.

37. A method as recited in claim 36, wherein the calculating a weighted 
summation comprises calculating the weighted summation based at least in part on 
user relevance feedback regarding how relevant images previously displayed to a 
user were. 

38. One or more computer readable media including a computer 
program that is executable by a processor to perform the method recited in claim 
31. 

39. One or more computer readable media having stored thereon a 
plurality of instructions that, when executed by one or more processors, causes the 
one or more processors to perform acts including: 

identifying a plurality of query vectors for one image, each query vector 
corresponding to one of a plurality of features; 

identifying a plurality of feature vectors for the other image, each feature 
vector corresponding to one of the plurality of features; 

for each feature, determining a distance between the corresponding query 
vector and the corresponding feature vector; and 
combining the distances to generate a value representing an overall distance
between the two images. 

40. One or more computer readable media as recited in claim 39, 
wherein the identifying the plurality of query vectors comprises extracting the 
plurality of query vectors from the image. 

41. One or more computer readable media as recited in claim 39, 
wherein the identifying the plurality of query vectors comprises generating the 
plurality of query vectors based at least in part on user relevance feedback 
regarding how relevant images previously displayed to a user were. 

42. One or more computer readable media as recited in claim 39, 
wherein the determining the distance between the corresponding query vector and 
the corresponding feature vector includes incorporating, into the determining, user 
relevance feedback regarding how relevant images previously displayed to a user 
were. 

43. One or more computer readable media as recited in claim 39, 
wherein the combining the distances comprises calculating a weighted summation 
of each of the individual distances between the feature vectors and the query 
vectors. 



Lee & Hayes, PLLC 



MSJ-610 PA TAPP DOC 






44. One or more computer readable media as recited in claim 43, 
wherein the calculating a weighted summation comprises calculating the weighted 
summation based at least in part on user relevance feedback regarding how 
relevant images previously displayed to a user were. 

45. A system comprising: 

a query vector generator to identify a query vector, for each of a plurality of 
features, corresponding to one image; 

a comparator, coupled to the query vector generator, to, 

identify, for each of a plurality of features of another image, a 
feature vector for the feature, 

determine, for each of the plurality of features, how closely the 
feature vector matches the query vector, and 

determine how closely the image matches the other based on how 
closely, for the plurality of features, the feature vectors match the query 
vectors. 

46. A method of generating a query vector to compare to a feature 
vector of another image, the method comprising: 

receiving feedback regarding the relevance of each image of a set of 
images; 

wherein N represents the number of images in the set of images for which
user relevance feedback has been received, $\pi_n$ represents the relevance of image n
in the set of images, $\pi$ represents a transposition of a vector generated by
concatenating the individual $\pi_n$ values, and X represents an image matrix that is
generated by stacking N training vectors corresponding to the set of images into a
matrix; and

generating a query vector ($\vec{q}$) corresponding to one of a plurality of
features as follows:

$$\vec{q}^{T} = \frac{\pi X}{\sum_{n=1}^{N} \pi_n}$$

47. One or more computer readable media including a computer 
program that is executable by a processor to perform the method recited in claim 
46. 

48. A method of generating a weight to apply to distances between 
query vectors and feature vectors when combining the distances, the method 
comprising: 

receiving feedback regarding the relevance of each image of a set of 
images; 

wherein $f_i$ represents a summation, over the images in the set of images, of
a product of a relevance of the image and a distance between the query vector and
the feature vector; and

generating a weight ($u_i$) for each of a plurality ($I$) of distances between a
query vector corresponding to one of a plurality ($I$) of features and a feature
vector corresponding to the one of the plurality ($I$) of features as:

$$u_i = \sum_{j=1}^{I} \sqrt{\frac{f_j}{f_i}}$$
49. One or more computer readable media including a computer
program that is executable by a processor to perform the method recited in claim 
48. 



50. A system comprising:
a client device; 

a collection of a plurality of images; 

an image server, coupled to the client device and the collection of a 
plurality of images, the image server to receive image retrieval requests from the 
client device and to, 

receive an initial image selection from the client device, 
generate a plurality of query vectors by extracting, for each query 
vector, one of a plurality of low-level features from the initial image 
selection, 

select a set of potentially relevant images based at least in part on 
distances between the plurality of query vectors and a plurality of feature 
vectors corresponding to low-level features of a plurality of images, 

receive feedback regarding the relevance of one or more images of 
the set of potentially relevant images, 

generate a new plurality of query vectors based at least in part on the 
feedback, 

generate a weighting of feature elements based at least in part on the 
feedback, and 
select a new set of potentially relevant images based at least in part
on both the weighting of feature elements and distances between the new 
plurality of query vectors and the plurality of feature vectors. 
ABSTRACT

An improved image retrieval process based on relevance feedback uses a 
hierarchical (per-feature) approach in comparing images. Multiple query vectors 
are generated for an initial image by extracting multiple low-level features from 
the initial image. When determining how closely a particular image in an image 
collection matches the initial image, a distance is calculated between the query 
vectors and corresponding low-level feature vectors extracted from the particular 
image. Once these individual distances are calculated, they are combined to 
generate an overall distance that represents how closely the two images match. 
According to other aspects, relevance feedback received regarding previously
retrieved images is used during the query vector generation and the distance 
determination to influence which images are subsequently retrieved. 
[Fig. 4 flowchart (client): Enter Initial Search Criteria; Search Results; Identify Relevance Of Search Results; Submit New Search Request Including Relevance Of Search Results]

[Fig. 5 flowchart (image server): Receive Search Criteria; Generate Query Vectors; Map Image Low-Level Feature Vectors To Higher Level Feature Space And Compare To Query Vectors; Identify Images That Most Closely Match Query Vectors; Forward Identified Images To Client; Receive User Feedback Regarding Relevance Of Identified Images; Use Relevance Feedback To Generate New Query Vector, Transformation Matrix, And Feature Distance Weights]